On Optimal Probabilities in Stochastic Coordinate Descent Methods

Authors

  • Peter Richtárik
  • Martin Takáč
Abstract

We propose and analyze a new parallel coordinate descent method, ‘NSync, in which at each iteration a random subset of coordinates is updated, in parallel, and the subsets may be chosen non-uniformly. We derive convergence rates under a strong convexity assumption and comment on how to assign probabilities to the sets so as to optimize the bound. In both theoretical complexity and practical performance, the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.
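As a rough illustration of the serial special case highlighted in the abstract, here is a minimal Python sketch of single-coordinate stochastic descent on a strongly convex quadratic, with coordinate i sampled with probability proportional to its coordinate-wise Lipschitz constant L_i = A_ii. The quadratic objective, the step sizes 1/L_i, and this probability choice are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def nonuniform_cd(A, b, iters=5000, seed=0):
    """Serial stochastic coordinate descent on f(x) = 0.5*x'Ax - b'x
    (A symmetric positive definite), sampling coordinate i with
    probability p_i proportional to L_i = A[i, i]. A sketch of the
    single-coordinate, non-uniform case; not the authors' code."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = np.diag(A).copy()        # coordinate-wise Lipschitz constants
    p = L / L.sum()              # non-uniform sampling probabilities
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(n, p=p)   # pick a coordinate non-uniformly
        g = A[i] @ x - b[i]      # i-th partial derivative of f
        x[i] -= g / L[i]         # exact minimization along coordinate i
    return x

# Usage: an ill-conditioned quadratic, where p_i proportional to L_i
# focuses work on the stiff coordinates.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 20))
A = M.T @ M + 0.1 * np.eye(20)
b = rng.standard_normal(20)
print(np.linalg.norm(A @ nonuniform_cd(A, b) - b))  # small residual
```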

Similar articles

Adaptive Probabilities in Stochastic Optimization Algorithms

Stochastic optimization methods have been studied extensively in recent years. In some classification settings, such as text document categorization, unbiased strategies such as uniform sampling can hurt the convergence rate because potential outlier data points distort the estimator. Consequently, more iterations are needed to converge to the optimal value for...

Faster Optimization through Adaptive Importance Sampling

The current state-of-the-art stochastic optimization algorithms (SGD, SVRG, SCD, SDCA, etc.) are based on sampling one active datapoint uniformly at random in each iteration. Changing these probabilities to better reflect the importance of each datapoint is a natural and powerful idea. In this thesis we analyze Stochastic Coordinate Descent methods with fixed non-uniform and adaptive sampling. ...
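As a sketch of the fixed non-uniform sampling idea (under an assumed least-squares objective and illustrative names, not code from the thesis): draw example j with probability p_j proportional to ||d_j||^2 and reweight its gradient by 1/(n p_j) so the stochastic gradient stays unbiased.

```python
import numpy as np

def importance_sgd(D, y, iters=20000, seed=0):
    """SGD on f(x) = (1/n) * sum_j 0.5*(d_j'x - y_j)^2, where row d_j of D
    is drawn with probability p_j proportional to ||d_j||^2 and its
    gradient is scaled by 1/(n*p_j) to keep the update unbiased.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, dim = D.shape
    norms2 = (D ** 2).sum(axis=1)
    p = norms2 / norms2.sum()         # fixed importance distribution
    lr = 1.0 / norms2.mean()          # step matched to average smoothness
    x = np.zeros(dim)
    for _ in range(iters):
        j = rng.choice(n, p=p)
        g = (D[j] @ x - y[j]) * D[j]  # gradient of the j-th term
        x -= lr * g / (n * p[j])      # unbiased importance reweighting
    return x
```

With these particular choices the update simplifies to the randomized Kaczmarz step d_j (d_j'x - y_j) / ||d_j||^2, which converges linearly on consistent systems; that coincidence is one reason row-norm sampling is a popular fixed importance distribution.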

Adaptive Sampling Probabilities for Non-Smooth Optimization

Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity in data. When gradients in certain blocks of features (for coordinate descent) or examples (for SGD) are larger than others, there is a natural structure that can be exploited for quicker convergence. Yet adaptive variants...
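A toy Python illustration of that adaptive idea, assuming a simple quadratic objective (a generic sketch of the mechanism, not the algorithm analyzed in the paper): track an exponential moving average of observed partial-derivative magnitudes and sample large-gradient coordinates more often, mixing in a uniform component so no coordinate is starved.

```python
import numpy as np

def adaptive_cd(A, b, iters=5000, mix=0.5, seed=0):
    """Coordinate descent on f(x) = 0.5*x'Ax - b'x whose sampling
    distribution adapts to observed gradient magnitudes: coordinates
    with recently large partial derivatives are drawn more often.
    Generic sketch of adaptive sampling, not the cited paper's method."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = np.diag(A).copy()              # coordinate Lipschitz constants
    scores = np.ones(n)                # running |gradient| estimates
    x = np.zeros(n)
    for _ in range(iters):
        # blend adaptive scores with a uniform floor for stability
        p = (1 - mix) * scores / scores.sum() + mix / n
        i = rng.choice(n, p=p)
        g = A[i] @ x - b[i]
        scores[i] = 0.9 * scores[i] + 0.1 * abs(g)  # update the score
        x[i] -= g / L[i]               # exact coordinate step
    return x
```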

Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

We present and study a distributed optimization algorithm based on a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods on regularized loss minimization problems, yet they have received little study in a distributed framework. We ...

Optimal quantization methods and applications to numerical problems in finance

We review optimal quantization methods for numerically solving nonlinear problems in higher dimensions associated with Markov processes. Quantization of a Markov process consists of a spatial discretization on finite grids optimally fitted to the dynamics of the process. Two quantization methods are proposed: the first one, called marginal quantization, relies on an optimal approximation of the ...
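A minimal one-dimensional sketch of the marginal-quantization idea under assumed dynamics (a geometric Brownian motion with illustrative parameters; this is not the paper's construction): approximate each marginal law X_t by a finite grid computed with Lloyd's algorithm, which locally minimizes the quadratic quantization error.

```python
import numpy as np

def lloyd_1d(samples, n_points=20, iters=50):
    """Lloyd's algorithm in 1-D: an n_points grid that locally
    minimizes the L2 quantization error of `samples`."""
    grid = np.quantile(samples, np.linspace(0.01, 0.99, n_points))
    for _ in range(iters):
        # assign each sample to its nearest grid point, then recenter
        idx = np.abs(samples[:, None] - grid[None, :]).argmin(axis=1)
        for k in range(n_points):
            cell = samples[idx == k]
            if cell.size:
                grid[k] = cell.mean()
    return np.sort(grid)

# Simulate paths of a geometric Brownian motion (illustrative
# parameters) and fit an optimal grid to each marginal separately.
rng = np.random.default_rng(0)
n_paths, n_steps, dt, sigma, r = 20000, 10, 0.1, 0.2, 0.05
X = np.full(n_paths, 100.0)
grids = []
for _ in range(n_steps):
    X *= np.exp((r - 0.5 * sigma**2) * dt
                + sigma * np.sqrt(dt) * rng.standard_normal(n_paths))
    grids.append(lloyd_1d(X))    # grid fitted to the marginal law of X_t
print(grids[-1][:5])             # a few points of the final-time grid
```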

Journal title:
  • Optimization Letters

Volume 10, Issue -

Pages -

Publication date: 2016